Exploiting Data Locality in Adaptive Architectures

Authors

  • Dan Wallin
  • Erik Hagersten
Abstract

Processor speeds increase much faster than memory access times improve, which makes memory accesses increasingly expensive. To address this problem, cache hierarchies are introduced to supply the processor with data. However, the effectiveness of caches depends on the amount of locality in the application’s memory access pattern. Programs differ greatly in their cache miss characteristics, access patterns, and communication intensity. A computer built for many different computational tasks can therefore benefit from dynamically adapting to the varying needs of its applications. This thesis shows that a cc-NUMA multiprocessor with data migration and replication optimizations efficiently exploits the temporal locality of algorithms. The performance of the self-optimizing system is similar to that of a system with a perfect initial thread and data placement. Data locality optimizations do not come for free. Coherence protocols with large cache lines improve spatial locality but increase false sharing misses for many applications. Prefetching techniques that reduce cache misses often lead to increased address and data traffic. Several techniques introduced in this thesis efficiently avoid these drawbacks. The bundling technique reduces the coherence traffic in multiprocessor prefetchers. This is especially important in snoop-based systems, where coherence bandwidth is a scarce resource. Bundled prefetchers reduce both the cache miss rate and the coherence traffic compared with non-prefetching protocols. On average over all examined applications, the most efficient bundled prefetching protocol studied lowers cache misses by 27 percent and address snoops by 24 percent relative to a non-prefetching protocol. Another proposed technique, capacity prefetching, avoids false sharing misses by distinguishing, at run time, cache lines involved in communication from non-communicating cache lines.
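The trade-off between spatial locality and false sharing mentioned in the abstract can be illustrated with a toy model. The sketch below is not the simulator or protocol used in the thesis; it assumes infinite per-CPU caches, a simple write-invalidate scheme, and word-granular addresses, and the names `count_misses`, `scan`, and `ping_pong` are hypothetical.

```python
def count_misses(trace, line_size):
    """Count cache misses for a trace of (cpu, op, address) tuples.

    Toy assumptions: every CPU has an infinite cache, and a write
    invalidates the written line in all other CPUs' caches.
    Addresses and line_size are both measured in words.
    """
    caches = {}  # cpu id -> set of cached line numbers
    misses = 0
    for cpu, op, addr in trace:
        line = addr // line_size
        cached = caches.setdefault(cpu, set())
        if line not in cached:
            misses += 1
            cached.add(line)
        if op == "w":
            # Write-invalidate: drop the line from every other cache.
            for other, lines in caches.items():
                if other != cpu:
                    lines.discard(line)
    return misses


# One CPU reads 64 consecutive words: larger lines exploit spatial locality.
scan = [(0, "r", addr) for addr in range(64)]

# Two CPUs alternately write to their own neighbouring words (0 and 1):
# with larger lines both words share a line, causing false sharing.
ping_pong = [(cpu, "w", cpu) for _ in range(10) for cpu in (0, 1)]

for size in (1, 8):
    print(size, count_misses(scan, size), count_misses(ping_pong, size))
```

In this model, growing the line from 1 to 8 words cuts the misses of the sequential scan from 64 to 8, while the misses of the alternating writers rise from 2 to 20, even though the two CPUs never touch the same word.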

Similar resources

Parallelization and Locality Analysis for Adaptive Computing Systems

This paper presents a strategy for compiling to adaptive computing architectures, systems that incorporate configurable logic devices such as FPGAs. Compared to conventional instruction set architectures, adaptive computing systems offer the opportunity to customize the logic according to the requirements of each application. In this paper, we focus on a particular aspect of customizing the l...

Exploiting Data Locality on Scalable

OpenMP offers a high-level interface for parallel programming on scalable shared memory (SMP) architectures, providing the user with simple work-sharing directives while relying on the compiler to generate parallel programs based on thread parallelism. However, the lack of language features for exploiting data locality often results in poor performance since the non-uniform memory access times on...

Exploiting the locality of data structures in multithreaded architecture

Multithreaded architectures that take a hybrid approach between von Neumann computers and dataflow computers are currently an area of active research. Multithreaded architectures can improve performance by exploiting locality within a thread and asynchronous parallel execution among threads. However, it has been overlooked how to effectively exploit the locality of large shared data structures among threa...

Exploiting Data Transfer Locality in Memory Mapping

System-level exploration of memory architectures is one of the key issues in the successful implementation of data-transfer-dominated applications. Usually, one of the main design bottlenecks is the memory access bandwidth. Transformations that rearrange the layout of the data records stored in memory are very effective at improving the locality of the data transfers but usually lead to a large memory ...

Using the Adaptive Frequency Nonlinear Oscillator for Earning an Energy Efficient Motion Pattern in a Leg-Like Stretchable Pendulum by Exploiting the Resonant Mode

In this paper we investigate a biological framework to generate and adapt a motion pattern so that it can be energy efficient. In fact, the motion pattern in legged animals and humans emerges from the interaction between a central pattern generator (CPG) neural network and the musculoskeletal system. Here, we model this neuro-musculoskeletal system by means of a leg-like mechanical system cal...

A Case for Fine-Grain Adaptive Cache Coherence

As transistor density continues to grow geometrically, processor manufacturers are already able to place a hundred cores on a chip (e.g., Tilera TILE-Gx 100), with massive multicore chips on the horizon. Programmers now need to invest more effort in designing software capable of exploiting multicore parallelism. The shared memory paradigm provides a convenient layer of abstraction to the progra...

Publication date: 2003